Collaborating Authors

 Karlsruhe





Learning Superconductivity from Ordered and Disordered Material Structures

Pin Chen

Neural Information Processing Systems

However, some critical aspects of superconductivity, such as the relationship between superconductivity and materials' chemical and structural features, are still not understood. Recent successes of data-driven approaches in materials science have inspired researchers to study this relationship with such methods, but a corresponding dataset is still lacking.


The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels

Neural Information Processing Systems

Such embeddings induce the so-called maximum mean discrepancy (MMD; [Smola et al., 2007, Gretton et al., 2012]), which quantifies the discrepancy [...] Many estimators for HSIC exist. The classical ones rely on U-statistics or V-statistics [Gretton et al., 2005, Quadrianto et al., 2009, Pfister et al., 2018] and are known to converge at a rate of [...] Lower bounds for the related MMD are known [Tolstikhin et al., 2016], but the existing analysis considers radial kernels and relies on independent Gaussian distributions.
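To make the V-statistic estimator mentioned above concrete, here is a minimal sketch of the standard biased HSIC estimate, (1/n^2) trace(K H L H), where K and L are Gram matrices and H is the centering matrix. It assumes scalar samples and a Gaussian kernel with a fixed bandwidth; the function names and the bandwidth default are illustrative choices, not taken from the paper.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs]
        for a in xs
    ]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_v_statistic(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) HSIC estimate: (1/n^2) * trace(K H L H)."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) = trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / n ** 2


# Illustrative data (assumed): ys_dependent is a function of xs, ys_constant is not.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys_dependent = [2.0 * x for x in xs]
ys_constant = [1.0] * len(xs)
```

With this data, the estimate is strictly positive for the dependent pair and zero (up to rounding) when one variable is constant. The double loop costs O(n^2) time and memory, which is fine for a sketch but not for large samples.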






A benchmark of categorical encoders for binary classification

Neural Information Processing Systems

Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper is the most comprehensive benchmark of categorical encoders to date, including an extensive evaluation of 32 configurations of encoders from diverse families, with 48 combinations of experimental factors, and on 50 datasets. The study shows the profound influence of dataset selection, experimental factors, and aggregation strategies on the benchmark's conclusions, aspects that previous encoder benchmarks disregarded.
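As an illustration of what a categorical encoder does, the sketch below implements two common encoders for a binary target: one-hot encoding and mean-target encoding. They are generic examples and are not claimed to match any of the 32 configurations in the benchmark; the function names and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector over the sorted category set."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def mean_target_encode(values, targets):
    """Replace each category with the mean of the binary target within that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    means = {v: sums[v] / counts[v] for v in counts}
    return [means[v] for v in values]


# Toy data (assumed): one categorical feature and a binary label.
colors = ["red", "blue", "red", "green"]
labels = [1, 0, 0, 1]

one_hot, categories = one_hot_encode(colors)
target_encoded = mean_target_encode(colors, labels)
```

In practice, a target-based encoder should be fitted on training data only and then applied to held-out data, since computing category means on the same rows being evaluated leaks label information.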